MISINFORMATION, RELIABILITY AND ITEM DISCRIMINATION
INDICES ON MULTIPLE-CHOICE TESTS


ABSTRACT 


Robert B. Frary and Stephen R. Lowry 
Virginia Polytechnic Institute 
and State University 


This paper presents theory concerning the relationships between reliability, misinformation and item discrimination coefficients. It is shown that, to the extent that misinformation rather than ignorance causes examinees to miss multiple-choice items, higher item discrimination coefficients and lower difficulty indices may be expected. Data were collected which partially confirmed the prevalence of these outcomes in typical college classroom testing situations involving six tests and 210 examinees. The implications of the findings are discussed with respect to commonly used test construction procedures. Specifically, a caution is voiced concerning possible biasing of tests to penalize misinformation more so than ignorance when this approach is inappropriate.



The concepts of misinformation and ignorance are critical for interpreting the results of testing. In academic testing, for example, a prevalence of low scores due to misinformation suggests an entirely different approach to remediation than low scores resulting from ignorance. In professional licensing examinations, misinformation probably represents much stronger grounds for denial of licensing than does ignorance. A medical practitioner who is merely ignorant may seek consultation with colleagues when in doubt regarding treatment of a patient. In contrast, a misinformed practitioner may make a fatal mistake.

On a multiple-choice test question, it is plausible to define ignorance 
and misinformation according to an examinee's strategy or behavior in answering 
the question. When the examinee does not know the answer with a substantial 
degree of certainty, yet intends to answer it nevertheless, the following 
behavior is hypothesized. First, he eliminates choices he believes to be wrong
with a substantial degree of certainty. Then he guesses among the remaining 
choices. If the eventual choice is incorrect, this outcome may be categorized 
as being due either to ignorance or to misinformation on the following basis: 


Ignorance. The right answer was one of the choices among which the 
examinee guessed. 


Misinformation. The right answer was one of the choices eliminated with 
a substantial degree of certainty. 
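The two definitions amount to a simple rule applied to each wrong answer. A minimal sketch, assuming each response is recorded as the set of confidently eliminated choices plus the final choice; the function name and the choice letters are hypothetical:

```python
from typing import Optional, Set


def classify_wrong_answer(eliminated: Set[str], chosen: str, correct: str) -> Optional[str]:
    """Classify one response using the ignorance/misinformation definitions.

    eliminated: choices ruled out with a substantial degree of certainty
    chosen:     the choice finally given as the answer (a guess among the rest)
    correct:    the keyed right answer
    Returns None when the answer is right.
    """
    if chosen == correct:
        return None
    if correct in eliminated:
        # The right answer was among the confidently eliminated choices.
        return "misinformation"
    # The right answer survived elimination, so it was among the guessed choices.
    return "ignorance"


# Usage with hypothetical choice letters A-D, keyed answer D.
print(classify_wrong_answer({"A", "B"}, "C", "D"))  # guessed between C and D: ignorance
print(classify_wrong_answer({"A", "D"}, "B", "D"))  # D was ruled out: misinformation
```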


It will be shown that item selection procedures typically employed in test development tend to bias tests so that low scorers are likely to display misinformation more so than ignorance.


Item Difficulty 


A typical problem in test development is finding good, yet difficult, items. In order to maximize internal consistency for a test designed to measure a wide range of performance, a substantial proportion of difficult items is required, that is, items answered correctly by less than half of the examinees. Items of this sort are likely to reflect misinformation more so than ignorance. This statement follows because, in the absence of misinformation and with a small proportion of examinees knowing the answer with assurance, guessing success may make the item appear to be of only medium difficulty. The following hypothetical response proportions illustrate this point for four items, each of which is known with assurance by only one-tenth of the examinees:


                                                     Item
                                               1     2     3     4
A. Proportion knowing answer with             .1    .1    .1    .1
   assurance
B. Proportion with misinformation             .0    .0    .4    .4
C. Proportion of ignorant examinees           .9    .9    .5    .5
   (1 − A − B)
D. Average probability of correct             .2    .4    .2    .4
   guess for ignorant examinees
E. Proportion of correct answers             .18   .36   .10   .20
   from guessing (C x D)
F. Total proportion correct (A + E)          .28   .46   .20   .30
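The arithmetic behind the table (C = 1 − A − B, E = C × D, F = A + E) can be reproduced in a few lines. The variable and function names below are ours, and the inputs are the hypothetical values from rows A, B and D:

```python
# Hypothetical values from rows A, B and D of the table, items 1-4.
know = [0.1, 0.1, 0.1, 0.1]          # A: proportion knowing the answer with assurance
misinformed = [0.0, 0.0, 0.4, 0.4]   # B: proportion with misinformation
guess_p = [0.2, 0.4, 0.2, 0.4]       # D: average probability of a correct guess


def proportion_correct(a, b, d):
    ignorant = 1.0 - a - b           # C: everyone else is ignorant and guesses
    from_guessing = ignorant * d     # E = C x D
    return a + from_guessing         # F = A + E


totals = [round(proportion_correct(a, b, d), 2) for a, b, d in zip(know, misinformed, guess_p)]
print(totals)  # [0.28, 0.46, 0.2, 0.3]
```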


In seeking difficult items, there would be a substantial tendency to choose items like 3 over items like 1 (.20 versus .28 correct), because item 3 has a higher proportion wrong due to misinformation. However, items like 1 and 3, on which guessing would be essentially random (assuming five choices), are probably quite rare. Difficult items are more likely to be selected from ones like 2 and 4, on which the probability of a correct guess is higher. Again, the item representing more misinformation (item 4, .30 correct, versus item 2, .46 correct) seems more likely to be chosen on the basis of difficulty.


Item Correlation with Total and Criterion Scores 


Other than difficulty level, the main basis for selecting one item over 
another is correlation with total score or a criterion. Again, items representing 
misinformation are more likely to appear superior in this respect. To under- 
stand why this result follows, consider two items, both of which can be answered 
with assurance by high scorers on the test itself or the criterion. In contrast,
low scorers on the test or criterion are ignorant on one item and misinformed on 
the other. The item representing ignorance will be answered correctly by a 
proportion of the ignorant low scorers, thus decreasing its correlation with 
total score or the criterion in comparison with the other item, which is 
answered incorrectly by all of these same low scorers.
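A small hypothetical demonstration of this argument, with invented data: forty examinees, the lower twenty of whom are misinformed on item M and ignorant on item I. Guessing by the ignorant low scorers is modelled deterministically (every fourth one guesses right, as with four choices), and the point-biserial coefficient is computed as a Pearson correlation between the 0/1 item score and the total:

```python
import math


def pearson(x, y):
    """Pearson correlation; with a 0/1 item score this is the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical examinees: totals 0-19 are low scorers, 20-39 are high scorers.
totals = list(range(40))
high = [t >= 20 for t in totals]

# Item M (misinformation): high scorers right, every low scorer wrong.
item_m = [1 if h else 0 for h in high]
# Item I (ignorance): high scorers right; every fourth low scorer guesses right.
item_i = [1 if h or t % 4 == 0 else 0 for t, h in zip(totals, high)]

r_m = pearson(item_m, totals)
r_i = pearson(item_i, totals)
print(round(r_m, 2), round(r_i, 2))
```

With these invented data the misinformation item correlates with the total at roughly .87 and the ignorance item at roughly .64, the direction the argument predicts.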

Empirical Investigation

Determining whether the theory developed above holds requires knowing, for each item of a test in the preliminary stages of development, the proportion of examinees with misinformation. Then, across the test items under consideration for retention, this proportion should correlate with item selection statistics.

Data for the study came from six college-level biology tests consisting of 20 to 35 four-choice items. The tests were administered to three groups (two tests per group) of approximately 40 to 80 students each. Students were instructed to respond according to the mode suggested by Coombs and others (1956), in which examinees mark the choices they believe incorrect, with a strong penalty for inadvertently marking the correct answer. Course grades were assigned on the basis of this scoring procedure, so that students should have been strongly motivated to avoid marking choices about which they were unsure. (Examinees received one point for each correctly identified wrong answer and a three-point penalty for marking the right choice.) In addition, examinees recorded their order of elimination of wrong choices and their best guesses as to the correct answers, to produce the usual number-right scores.
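The elimination scoring rule just described can be written down directly. The sketch below, with hypothetical function names, also computes the expected gain from marking a choice that the examinee believes is wrong with probability q: q(+1) + (1 − q)(−3) = 4q − 3, which is positive only when q exceeds .75. That arithmetic is one way to see why the rule should discourage marking choices about which examinees were unsure:

```python
def elimination_score(marked_wrong, correct):
    """Score one item: +1 per wrong choice correctly marked, -3 if the right one is marked."""
    return sum(-3 if choice == correct else 1 for choice in marked_wrong)


def number_right(best_guess, correct):
    """Conventional number-right credit from the examinee's best guess."""
    return 1 if best_guess == correct else 0


def expected_gain(q):
    """Expected score change from marking a choice believed wrong with probability q."""
    return q * 1 + (1 - q) * (-3)


print(elimination_score(["A", "B", "C"], "D"))  # all three distractors found: 3
print(elimination_score(["A", "D"], "D"))       # one distractor plus the key: 1 - 3 = -2
print(expected_gain(0.5), expected_gain(0.75))  # -1.0 0.0
```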

Whenever an examinee inadvertently marked a correct choice as incorrect, it was assumed that he had misinformation to some degree. Further, it was assumed that the earlier he marked the correct choice, the greater the degree of his misinformation, that is, the more confident he was that the right choice was wrong. However, an artifact in the response procedure tended to bias this measure. Examinees apparently tended to eliminate an earlier-appearing choice sooner than a later one when they were about equally confident that both were wrong. This phenomenon was detected by noting a substantial correlation, across items, between choice position for wrong choices and mean order of elimination on all six tests. To correct for this bias, the mean order of elimination of right choices was standardized as the deviation of this mean for each item from the same mean over all items with the same choice position. For a procedural explanation of this standardization, consider a hypothetical item for which the correct answer is the second choice. Perhaps ten examinees inadvertently eliminated this choice, and their average order of elimination of this choice is 2.1. For all items for which the second choice is the correct answer, the mean order of inadvertent eliminations is then calculated to be 2.5. The deviation, 2.5 − 2.1 = .4, is then used to express the degree or intensity of misinformation associated with the item in question. Its positive value suggests that examinees were more prone than on other items to eliminate the right choice early.
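The standardization can be sketched as follows, assuming the position mean pools every inadvertent elimination across items with the same keyed position (averaging the item means is another possible reading). Item identifiers and the function name are hypothetical. The sign convention, position mean minus item mean, reproduces the worked example: a positive value means the right choice was eliminated earlier than usual.

```python
from collections import defaultdict


def imis(items):
    """items: (item_id, keyed_position, orders) triples, where orders lists the
    elimination order at which each examinee inadvertently eliminated the right choice.
    Returns {item_id: position mean - item mean} for items with any such eliminations."""
    pooled = defaultdict(list)
    for _, position, orders in items:
        pooled[position].extend(orders)
    position_mean = {pos: sum(o) / len(o) for pos, o in pooled.items() if o}

    result = {}
    for item_id, position, orders in items:
        if orders:
            result[item_id] = position_mean[position] - sum(orders) / len(orders)
    return result


# Two hypothetical items keyed at position 2: item Q's mean order is 2.1,
# and the pooled mean for position 2 is 2.5, as in the worked example.
items = [("Q", 2, [2] * 9 + [3]), ("R", 2, [3] * 9 + [2])]
print(imis(items))  # Q about 0.4 (earlier than usual), R about -0.4
```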

The above procedures made possible computation of three measures of misinformation for each test item:

AMIS: Proportion of examinees with misinformation, an absolute measure of misinformation.

RMIS: Proportion of examinees missing the item who displayed misinformation, a relative measure of misinformation.

IMIS: Deviation of the item's mean order of elimination of the right answer from the mean order of elimination of the right answer across all items with the same choice position for the right answer (the latter minus the former), a measure of intensity of misinformation.

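
Given, for one item, each examinee's number-right outcome and whether he inadvertently marked the right choice as wrong, AMIS and RMIS reduce to two proportions. A sketch with a hypothetical record format and function name:

```python
def amis_rmis(responses):
    """responses: one (answered_correctly, misinformed) pair per examinee for a single item.

    AMIS = misinformed examinees / all examinees
    RMIS = misinformed examinees among those who missed the item / examinees who missed it
    """
    n = len(responses)
    amis = sum(1 for _, misinformed in responses if misinformed) / n
    missed = [misinformed for correct, misinformed in responses if not correct]
    rmis = sum(missed) / len(missed) if missed else None  # undefined if nobody missed
    return amis, rmis


# Hypothetical item: 6 right, 1 wrong after eliminating the key, 3 wrong by guessing.
responses = [(True, False)] * 6 + [(False, True)] + [(False, False)] * 3
print(amis_rmis(responses))  # (0.1, 0.25)
```
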
For each item of the six tests, the following item quality indices were 
computed: 
DISC: Item discrimination coefficient (point biserial).
DDEV: Absolute deviation of item difficulty index from .5.
QUAL: Overall item quality index,
$\mathrm{QUAL} = \sqrt{(\mathrm{DDEV})^2 + (1 - \mathrm{DISC})^2}$,
as recommended by Davis (1964).
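
A minimal sketch of the three quality indices for a single item. DISC is computed as a point-biserial coefficient and DDEV as the absolute distance of the proportion correct from .5. For QUAL the sketch assumes the distance form sqrt(DDEV² + (1 − DISC)²), in which an ideal item (difficulty .5, discrimination 1) scores 0 and larger values are worse; treat that form as an assumption, not a quotation of Davis (1964). Function names and data are hypothetical.

```python
import math


def point_biserial(item, totals):
    """DISC: point-biserial r between a 0/1 item score and total score.
    Assumes the item has both right and wrong answers."""
    n = len(totals)
    mean = sum(totals) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in totals) / n)  # population SD
    right = [t for x, t in zip(item, totals) if x == 1]
    wrong = [t for x, t in zip(item, totals) if x == 0]
    p = len(right) / n
    return (sum(right) / len(right) - sum(wrong) / len(wrong)) / sd * math.sqrt(p * (1 - p))


def ddev(item):
    """DDEV: absolute deviation of the difficulty index (proportion correct) from .5."""
    return abs(sum(item) / len(item) - 0.5)


def qual(disc, dev):
    """QUAL under an ASSUMED form: distance from the ideal item (DDEV = 0, DISC = 1).
    Zero is best; larger values are worse."""
    return math.sqrt(dev ** 2 + (1 - disc) ** 2)


# Hypothetical item answered correctly by the three highest of five examinees.
item = [1, 1, 1, 0, 0]
totals = [30, 28, 25, 20, 12]
disc, dev = point_biserial(item, totals), ddev(item)
print(round(disc, 2), round(dev, 2), round(qual(disc, dev), 2))
```
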
Initial inspection of statistics from the six tests revealed a number of exceedingly easy items. The presence of many such items would have resulted in a very skewed distribution of the number of examinees with misinformation (across items), since it is not possible to have a large proportion misinformed when very few miss the item. Also, such items are easily avoided in test development. Therefore, items answered correctly by more than 90 percent of the examinees were dropped from the study. The tests were rescored, and all statistics reported reflect only the residual items. These remaining items yielded test statistics as shown in Table 1. As the reader may judge, these statistics appear typical of those encountered in the early stages of test development. However, one statistic usually not available to test developers, the correlation (across examinees) between number of items missed due to misinformation and raw score, was surprising. For each of the six tests, this correlation, though negative, was of only moderate size, suggesting that misinformation did not reside so extensively among low scorers as assumed in the discussion concerning discrimination coefficients. Further computation revealed that the correlations between proportion of items missed by each examinee due to misinformation and total score were near zero. Therefore, for the six tests under consideration, data analysis focused on the extent to which item selection criteria might be related to misinformation in the absence of satisfying the assumption that extent of misinformation was strongly related to total score.
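
A small hypothetical illustration (invented numbers, not the study's data) of how these two correlations can diverge: if each examinee's share of misses due to misinformation is unrelated to score, the count of such misses still falls as score rises, simply because higher scorers miss fewer items. The test length and all names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


N_ITEMS = 20                              # hypothetical test length
scores = [5, 5, 10, 10, 15, 15]           # hypothetical number-right scores
misinfo_misses = [3, 9, 2, 6, 1, 3]       # items missed through misinformation
missed = [N_ITEMS - s for s in scores]    # 15, 15, 10, 10, 5, 5
# Share of misses due to misinformation: .2 or .6 at every score level.
share = [m / total for m, total in zip(misinfo_misses, missed)]

r_count = pearson(scores, misinfo_misses)  # clearly negative (about -.60)
r_share = pearson(scores, share)           # essentially zero
print(round(r_count, 2), round(r_share, 2))
```
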

For each test, intercorrelations among the three quality and three misinformation indices were computed (across items) as shown in Table 2. Correlations between AMIS (absolute proportion of misinformation) and DISC (discrimination coefficients) are generally negative, that is, low discrimination coefficients (poor items) seem to be associated with higher numbers of misinformed examinees. This result is contrary to what was expected and probably reflects the fact that misinformation was not substantially concentrated among lower scorers. In contrast, correlations between AMIS and DDEV (absolute value of deviation of item difficulty index from .5) are substantially negative, as suggested by the earlier discussion. This outcome reflects the fact that a majority of items had difficulty indices above .5. As a result, those with high AMIS values usually had difficulty indices closer to (yet still above) .5. This situation is partly artifactual in that tests with more difficult items might not have yielded this relationship. Yet it is realistic. Development of tests or item banks typically involves greater elimination of easy items than difficult ones, especially if means in the neighborhood of 50 to 60 percent are desired. Because QUAL (the Davis item quality index) combines item difficulty and discrimination, items with difficulty indices substantially above or below .5 tend to have high (poor) quality indices even when they have good discrimination coefficients. Accordingly, the correlations between AMIS and QUAL are generally negative in spite of the negative correlations between AMIS and DISC.

Correlations of RMIS and IMIS with the three quality indices reveal no obvious patterns of statistical significance. A correlation of .44 (p < .10) for Test 2 between IMIS and DISC does suggest that items on which the misinformation is more intense have higher discrimination indices. Also, for Test 2, a correlation of .47 (p < .05) between RMIS and DDEV is contrary to what might have been expected, suggesting that items with difficulties deviating more from .5 have higher relative misinformation measures. However, over the 109 items of the study, no correlation involving RMIS or IMIS is significant.
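
The text does not say which significance test was used. Assuming the usual two-tailed t test of r against zero, t = r·sqrt(n − 2)/sqrt(1 − r²), with n = 20 items for Test 2 (Table 1) and hence 18 degrees of freedom, the two Test 2 correlations can be checked as follows; the function name is hypothetical:

```python
import math


def t_for_r(r, n):
    """t statistic for testing a Pearson r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Two-tailed critical values of t for 18 degrees of freedom (standard t tables).
T_CRIT_10 = 1.734
T_CRIT_05 = 2.101

for r in (0.44, 0.47):  # the two Test 2 correlations cited in the text
    t = t_for_r(r, n=20)
    if abs(t) > T_CRIT_05:
        level = "p < .05"
    elif abs(t) > T_CRIT_10:
        level = "p < .10"
    else:
        level = "not significant"
    print(f"r = {r:.2f}: t = {t:.2f}, {level}")
```

Under these assumptions the two t values (about 2.08 and 2.26) fall on either side of the .05 critical value, consistent with the significance levels reported in the text.
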

Discussion 
The data of the study support the theoretical conclusions in only one respect, namely, that items with difficulty indices nearer .5 tend to represent higher than average proportions of misinformation across examinees. Of course, this outcome is at least partly due to a preponderance of items with difficulty indices above .5 among those available for analysis. Misinformation naturally reduces the number of correct responses, making an otherwise easy item look better statistically.

Nevertheless, the implications for test development are not trivial. In
a practical sense, there is every reason to believe that items representing
misinformation more so than ignorance have a greater chance of selection from
many item pools. Of course, the substantive nature of the items needs to be 
considered. For example, some tests, usually requiring numerical computation 
of answers, may have all wrong choices representing incorrect solutions arising 
from specific mistakes (misinformation), virtually requiring the ignorant 
examinee to guess at random or omit the item. 

In spite of and because of the somewhat inconclusive nature of what has 
been presented, there are substantial implications for further research. In
addition to knowledge scores from tests, examinees might well earn misinformation 
and ignorance scores or perhaps a single additional score representing 
the ratio of misinformation to ignorance. Validity studies might then reveal 
the meaning of these scores with respect to a variety of criteria. 
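
A per-examinee version of these scores could be tallied from item-level classifications of the kind defined earlier (right, wrong through ignorance, wrong through misinformation). The sketch below is hypothetical in its names and record format; it returns None for the ratio when no item was missed through ignorance, since the ratio is then undefined.

```python
from collections import Counter


def examinee_profile(classifications):
    """classifications: one entry per item for one examinee: None for a right answer,
    otherwise "ignorance" or "misinformation"."""
    counts = Counter(c for c in classifications if c is not None)
    misinformation, ignorance = counts["misinformation"], counts["ignorance"]
    return {
        "knowledge": sum(1 for c in classifications if c is None),
        "misinformation": misinformation,
        "ignorance": ignorance,
        # Ratio is undefined when no item was missed through ignorance.
        "ratio": misinformation / ignorance if ignorance else None,
    }


print(examinee_profile([None, None, "ignorance", "misinformation", "ignorance"]))
```
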

Other research might investigate various approaches to measuring misinformation and ignorance. One approach might involve use of the "I don't know" responses used for various standardized tests. The proportion of items missed from among those attempted might be related to misinformation, and the number of "I don't know" responses to ignorance. Various refinements to the methods presented in this paper are also possible. For example, multiple forms of the tests might be used with random ordering of choices within an item, to avoid the bias due to earlier misinformed elimination of right choices that appeared earlier in the list of choices for an item.
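
One way such forms could be produced is sketched below: each form shuffles an item's choices with its own seed and records where the keyed answer lands, so answer keys stay correct. The function name, the seeding scheme and the example item are hypothetical.

```python
import random


def make_form(choices, key_index, seed):
    """Shuffle one item's choices for a test form.

    Returns the reordered choices and the new position (0-based) of the keyed answer.
    A separate seed per form keeps each form reproducible."""
    rng = random.Random(seed)
    order = list(range(len(choices)))
    rng.shuffle(order)
    shuffled = [choices[i] for i in order]
    return shuffled, order.index(key_index)


choices = ["mitosis", "meiosis", "binary fission", "budding"]  # hypothetical item
for form_seed in range(3):
    shuffled, key = make_form(choices, key_index=0, seed=form_seed)
    print(form_seed, shuffled, "key at position", key + 1)
```
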
Table 1


Statistics for Six Biology Tests 


                                                   Test
                                          1     2     3     4     5     6
Number of examinees                      74    74    42    42    78    80
Number of items¹                         19    20    18    15    23    14
Mean item difficulty                    .71     ?   .74   .59   .52   .68
Mean item discrimination                .26   .36     ?     ?   .38   .36
KR-20                                   .58   .82   .47   .67   .69     ?
Mean order of elimination for           2.0   1.9   1.9   1.9   2.1   2.1
  misinformed responses²
Mean proportion of examinees per        .12   .09   .13   .20   .25     ?
  item with misinformation
Mean proportion per item of wrong       .38   .41   .48   .50   .56   .79
  choices due to misinformation
Mean absolute deviation of item         .22   .26   .24   .16   .16   .22
  difficulty indices from .5
Mean item quality index                 .80   .78   .78   .77   .72   .79
Correlation between raw score and      -.41  -.47  -.37  -.45  -.46  -.47
  number of items missed due to
  misinformation
Correlation between raw score and      -.07  -.13  -.17  -.05     ?  -.01
  proportion of items missed due to
  misinformation


¹Does not include items with difficulty indices above .9, which were not used in the study (see text).

²All items had four choices used with approximately equal frequency across items, so the mean position of the correct answer was about 2.5. Hence misinformed eliminations tended to occur sooner than the choice position of the correct answer.
Table 2


Correlations Between Item Quality 
and Misinformation Indices 


                                      Test                          All items
                    1        2        3        4        5        6     combined
Number of items    19       20       18       15       23       14        109
AMIS, DISC        .16     -.08     -.45*    -.38*     .08        ?     -.19**
AMIS, DDEV       -.76**   -.51**   -.81**   -.48**   -.35*    -.44*    -.53**
AMIS, QUAL       -.55*    -.41*    -.60*     .08        ?     -.27     -.26**
RMIS, DISC        .24      .10      .34      .12        ?      .52      .13
RMIS, DDEV       -.34      .47**   -.20      .09      .13     -.29      .02
RMIS, QUAL       -.44*     .42*    -.09        ?     -.07      .41     -.10
IMIS, DISC       -.07      .44*     .06        ?     -.03     -.18      .10
IMIS, DDEV       -.06     -.16      .09      .29      .17     -.42      .07
IMIS, QUAL        .13      .37     -.18        ?      .15        ?      .01


*Significant at the .10 level of probability
**Significant at the .05 level of probability